Pre-processing

  1. Adapters were trimmed with cutadapt v1.16.
  2. Gene expression was quantified with salmon v1.3.0.
  3. Gene-level TPMs were obtained with tximport v1.20.0.
  4. Sample HTB314 was removed from this analysis because its size factor differed markedly from those of the other samples and produced aberrant results.
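The exclusion criterion for HTB314 is described only qualitatively. A minimal sketch of how a size-factor outlier could be flagged with DESeq2 follows. It uses DESeq2's simulated example data, artificially scales one library 10-fold to stand in for an outlier, and applies a hypothetical 3-fold cutoff around the median size factor; it is not the authors' exact procedure.

```r
library(DESeq2)

set.seed(1)
# Simulated counts: 1000 genes x 6 samples (DESeq2's built-in example generator)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Artificially scale one library 10-fold so it behaves like an outlier sample
counts(dds)[, 6] <- counts(dds)[, 6] * 10L

# Median-of-ratios size factors
dds <- estimateSizeFactors(dds)
sf <- sizeFactors(dds)
print(round(sf, 2))

# Hypothetical rule: flag samples more than 3-fold away from the median size factor
ratio <- sf / median(sf)
outliers <- colnames(dds)[ratio > 3 | ratio < 1 / 3]

# Drop flagged samples, then re-estimate size factors on the remaining ones
dds_filtered <- dds[, !colnames(dds) %in% outliers]
dds_filtered <- estimateSizeFactors(dds_filtered)
```

Re-estimating after removal matters because the median-of-ratios reference (the per-gene geometric mean) includes every sample, so an extreme library shifts the size factors of all the others.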
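# Packages used in this analysis (tidyverse also attaches dplyr, ggplot2 and readr)
library(dplyr)           # data manipulation
library(ggplot2)         # plotting
library(DESeq2)          # differential expression testing
library(tximport)        # import salmon quantifications
library(readr)           # read delimited text files
library(tximportData)    # example data for tximport
library(readxl)          # read Excel files
library(knitr)           # kable() tables
library(tidyverse)
library(pheatmap)        # heatmaps
library(RColorBrewer)    # color palettes
library(viridis)         # color palettes
library(ggrepel)         # non-overlapping text labels
library(EnhancedVolcano) # volcano plots
library(fgsea)           # gene set enrichment analysis
library(limma)           # linear models for expression data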

Generate a summary table of the sequenced samples and their sequencing and alignment metrics, after removal of sample HTB314.

Summary of Data Metrics
Sample   Patient  Reads     %Q30  Duplication (%)  Adapter reads (%)  STAR aligned reads  STAR aligned (%)  Annotated splices  Salmon mapping (%)
Bulk     MDS268   31441538  93    24               2.3                25549835            81.26             5546270            58.2075
Bulk     MDS280   28794421  93    24               2.3                23317205            80.98             5115546            58.8614
CD123+   MDS268   32307881  93    27               3.3                26243441            81.23             7290861            64.2150
CD123+   MDS280   23745205  93    34               2.6                21035358            88.59             5668653            71.3245
CD123-   MDS268   27917999  93    26               2.7                22835189            81.79             5793322            63.9284
CD123-   MDS280   24767573  93    32               2.6                22467937            90.72             4773823            62.7193
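The STAR aligned (%) column in the summary table is consistent with STAR aligned reads divided by total reads. Recomputing it from the table's own numbers in base R reproduces the reported values to two decimals; the data frame below simply copies those columns.

```r
# Values copied from the summary table (reads and STAR aligned reads)
metrics <- data.frame(
  sample  = c("Bulk", "Bulk", "CD123+", "CD123+", "CD123-", "CD123-"),
  patient = rep(c("MDS268", "MDS280"), times = 3),
  reads = c(31441538, 28794421, 32307881, 23745205, 27917999, 24767573),
  star_aligned = c(25549835, 23317205, 26243441, 21035358, 22835189, 22467937)
)

# Percentage of reads aligned by STAR, rounded to two decimals as in the table
metrics$pct_aligned <- round(100 * metrics$star_aligned / metrics$reads, 2)
print(metrics)
```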

Import and format the data for DESeq2.
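A minimal sketch of the import step, following the standard tximport-to-DESeq2 route and using the example salmon output bundled with tximportData as a stand-in for this study's files. The `condition` column is a placeholder; for this study the sample table would instead describe the cell fraction (Bulk, CD123+, CD123-) and the patient, and the design actually used is not stated here.

```r
library(tximport)
library(tximportData)
library(readr)
library(DESeq2)

# Example salmon output shipped with tximportData (stand-in for this study's files)
dir <- system.file("extdata", package = "tximportData")
samples <- read.table(file.path(dir, "samples.txt"), header = TRUE)
samples$condition <- factor(rep(c("A", "B"), each = 3))  # placeholder grouping
rownames(samples) <- samples$run

# One quant.sf file per sample, named by run so column names carry through
files <- file.path(dir, "salmon", samples$run, "quant.sf.gz")
names(files) <- samples$run

# Transcript-to-gene map (columns TXNAME, GENEID)
tx2gene <- read_csv(file.path(dir, "tx2gene.gencode.v27.csv"))

# Summarize transcript-level salmon estimates to gene level
txi <- tximport(files, type = "salmon", tx2gene = tx2gene)
tpm <- txi$abundance  # gene-level TPM matrix

# Build the DESeqDataSet from estimated counts plus average transcript lengths
dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ condition)
```

`txi$abundance` holds the gene-level TPMs, while `DESeqDataSetFromTximport()` uses the estimated counts and passes the average transcript lengths on as a per-gene offset.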

Heatmap clustered with the Spearman method, and a sample-to-sample correlation heatmap
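A minimal sketch of a Spearman sample-to-sample correlation heatmap with pheatmap, clustering samples on the distance 1 - rho. It assumes variance-stabilized values as input; the document does not state whether VST, rlog or TPM values were used. DESeq2's simulated example data stands in for the real counts.

```r
library(DESeq2)
library(pheatmap)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)  # simulated stand-in data

# Variance-stabilized expression, blind to the design (appropriate for QC)
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)
mat <- assay(vsd)

# Sample-by-sample Spearman correlation matrix
rho <- cor(mat, method = "spearman")

# Cluster samples on 1 - rho and show the correlations in the cells
rho_dist <- as.dist(1 - rho)
pheatmap(rho,
         clustering_distance_rows = rho_dist,
         clustering_distance_cols = rho_dist,
         display_numbers = TRUE)
```

Spearman correlation works on ranks, so it is less sensitive than Pearson to a handful of very highly expressed genes.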

PCA plot

PC1 vs PC2
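A minimal sketch of a PC1 vs PC2 plot using DESeq2's `plotPCA()` with `returnData = TRUE`, then drawing the plot in ggplot2 with ggrepel labels. By default `plotPCA()` uses the 500 most variable genes. Simulated example data and the `condition` grouping are stand-ins for the study's samples and cell fractions.

```r
library(DESeq2)
library(ggplot2)
library(ggrepel)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)  # simulated stand-in data
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# PCA coordinates and percent variance explained by PC1 and PC2
pca_df <- plotPCA(vsd, intgroup = "condition", returnData = TRUE)
percent_var <- round(100 * attr(pca_df, "percentVar"))

# Label each point with its sample name
p <- ggplot(pca_df, aes(PC1, PC2, color = condition, label = name)) +
  geom_point(size = 3) +
  geom_text_repel() +
  xlab(paste0("PC1: ", percent_var[1], "% variance")) +
  ylab(paste0("PC2: ", percent_var[2], "% variance")) +
  coord_fixed()
print(p)
```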

3. Run Differential Expression Testing with DESeq2 and Calculate Gene Set Enrichment

Compare CD123+ vs CD123-, CD123- vs bulk, and CD123+ vs bulk.
Significant genes: padj < 0.01 and |log2 fold change| > 0.5.
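A minimal sketch of the three comparisons and the significance rule. It assumes a paired design `~ patient + fraction` and the hypothetical factor levels `bulk`, `CD123pos` and `CD123neg`; the real design and level names are not shown in the document. Simulated DESeq2 example data is relabeled to mirror the 3 fractions x 2 patients layout.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in: 6 samples = 3 fractions x 2 patients
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)
dds$patient  <- factor(rep(c("MDS268", "MDS280"), each = 3))
dds$fraction <- factor(rep(c("bulk", "CD123pos", "CD123neg"), times = 2),
                       levels = c("bulk", "CD123pos", "CD123neg"))
design(dds) <- ~ patient + fraction  # assumed paired design
dds <- DESeq(dds, quiet = TRUE)

# The three comparisons, written as (factor, numerator level, denominator level)
contrasts <- list(
  pos_vs_neg  = c("fraction", "CD123pos", "CD123neg"),
  neg_vs_bulk = c("fraction", "CD123neg", "bulk"),
  pos_vs_bulk = c("fraction", "CD123pos", "bulk")
)
res_list <- lapply(contrasts, function(ct) results(dds, contrast = ct))

# Significance rule from the text: padj < 0.01 and |log2FoldChange| > 0.5
is_sig <- function(res, padj_cut = 0.01, lfc_cut = 0.5) {
  !is.na(res$padj) & res$padj < padj_cut & abs(res$log2FoldChange) > lfc_cut
}
sig_list <- lapply(res_list, function(res) res[is_sig(res), ])
print(sapply(sig_list, nrow))
```

Filtering on |log2FoldChange| after testing is a post hoc cut. An alternative is `results(dds, contrast = ct, lfcThreshold = 0.5)`, which tests against a null of |LFC| <= 0.5 directly and so yields p-values that account for the threshold.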

## [1] TRUE
## [1] TRUE
## [1] 606
## [1] 1608
## [1] 1165
## [1] 1332
## [1] 2720
## [1] 2051
## [1] 564
## [1] 153

Volcano Plot
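A minimal sketch of a volcano plot with EnhancedVolcano, with the cut-offs set to the document's significance rule (padj < 0.01, |log2FC| > 0.5). DESeq2's simulated example data with `betaSD = 1` stands in for a real comparison so that some genes are truly differentially expressed.

```r
library(DESeq2)
library(EnhancedVolcano)

set.seed(1)
# Simulated stand-in with some true differential expression (betaSD > 0)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6, betaSD = 1)
dds <- DESeq(dds, quiet = TRUE)
res <- as.data.frame(results(dds))
res <- res[!is.na(res$padj), ]  # drop genes removed by independent filtering

# Thresholds mirror the significance rule: padj < 0.01, |log2FC| > 0.5
p <- EnhancedVolcano(res,
                     lab = rownames(res),
                     x = "log2FoldChange",
                     y = "padj",
                     pCutoff = 0.01,
                     FCcutoff = 0.5,
                     ylab = bquote(~-Log[10] ~ adjusted ~ italic(P)))
print(p)
```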

MA plots
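A minimal sketch of an MA plot with DESeq2's `plotMA()`, highlighting genes with padj < 0.01 and drawing dashed lines at the |log2FC| = 0.5 cut-off. Simulated example data stands in for the real comparisons.

```r
library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6, betaSD = 1)  # simulated stand-in
dds <- DESeq(dds, quiet = TRUE)
res <- results(dds, alpha = 0.01)  # alpha also tunes independent filtering

# Genes with padj < 0.01 are colored; dashed lines mark |log2FC| = 0.5
plotMA(res, alpha = 0.01, ylim = c(-4, 4))
abline(h = c(-0.5, 0.5), lty = 2, col = "grey40")

# Genes passing both parts of the significance rule
n_sig <- sum(res$padj < 0.01 & abs(res$log2FoldChange) > 0.5, na.rm = TRUE)
print(n_sig)
```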

## Warning: Unknown or uninitialised column: `percentrRNA`.

Gene set enrichment analysis (GSEA)
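The gene sets and ranking statistic used here are not shown. A minimal sketch of preranked GSEA with fgsea uses the package's bundled `examplePathways` and `exampleRanks`. With DESeq2 output, one common (assumed) choice is to rank genes by the Wald `stat` column, named by gene ID. Those IDs must match the identifiers in the gene sets, which are Entrez IDs in the example data.

```r
library(fgsea)

# Example gene sets (list of Entrez ID vectors) and a named, preranked statistic vector
data(examplePathways)
data(exampleRanks)

set.seed(42)  # fgsea p-value estimation involves random sampling
fgsea_res <- fgsea(pathways = examplePathways,
                   stats = exampleRanks,
                   minSize = 15,
                   maxSize = 500)

# Most significant gene sets first
fgsea_res <- fgsea_res[order(fgsea_res$pval), ]
print(head(as.data.frame(fgsea_res)[, c("pathway", "NES", "padj", "size")]))
```

The `minSize` and `maxSize` bounds drop very small and very large gene sets before testing. Small sets give unstable enrichment scores, and very large ones are rarely informative.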